AITopics | generation accuracy

generation accuracy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

52cf49fea5ff66588408852f65cf8272-AuthorFeedback.pdf

Neural Information Processing Systems | Feb-8-2026, 10:56:43 GMT

accuracy, generation accuracy, user study, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.72)

52cf49fea5ff66588408852f65cf8272-AuthorFeedback.pdf

Neural Information Processing Systems | Oct-2-2025, 22:37:56 GMT

As noted in point 1 above, CoPINet's success is partly due to the ability to answer RAVEN questions without

accuracy, artificial intelligence, user study, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.72)

The Lookahead Limitation: Why Multi-Operand Addition is Hard for LLMs

Baeumel, Tanja, van Genabith, Josef, Ostermann, Simon

arXiv.org Artificial Intelligence | Feb-27-2025

Autoregressive large language models (LLMs) exhibit impressive performance across various tasks but struggle with simple arithmetic, such as the addition of two or more operands. We show that this struggle arises from LLMs' use of a simple one-digit lookahead heuristic, which works fairly well (though not perfectly) for two-operand addition but fails in multi-operand cases, where the carry-over logic is more complex. Our probing experiments and digit-wise accuracy evaluation show that LLMs fail precisely where a one-digit lookahead is insufficient to account for cascading carries. We analyze the impact of tokenization strategies on arithmetic performance and show that all investigated models, regardless of tokenization, are inherently limited in the addition of multiple operands due to their reliance on a one-digit lookahead heuristic. Our findings reveal fundamental limitations that prevent LLMs from generalizing to more complex numerical reasoning.
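To make the heuristic concrete, here is a toy Python sketch written under our own assumptions rather than taken from the authors' code: the sum is emitted digit by digit from left to right, and the carry into each column is guessed from the digit sum of the next column only (`sums[i + 1] // 10`), ignoring carries that cascade in from further right. The function names are hypothetical.

```python
from typing import List


def column_sums(operands: List[int], width: int) -> List[int]:
    """Digit sum of each column, most significant column first."""
    padded = [str(n).zfill(width) for n in operands]
    return [sum(int(p[i]) for p in padded) for i in range(width)]


def exact_digits(operands: List[int]) -> str:
    """Ground truth: the correct sum as a digit string."""
    return str(sum(operands))


def lookahead_digits(operands: List[int]) -> str:
    """Emit the sum left to right, guessing each column's incoming carry
    from the next column's digit sum only (a one-digit lookahead)."""
    width = len(str(sum(operands)))  # width of the true sum; operands must be non-negative
    sums = column_sums(operands, width) + [0]  # sentinel: nothing right of the units column
    digits = []
    for i in range(width):
        carry_guess = sums[i + 1] // 10  # ignores carries cascading in from further right
        digits.append(str((sums[i] + carry_guess) % 10))
    return "".join(digits)


if __name__ == "__main__":
    for ops in ([57, 68], [45, 55], [19] * 6):
        print(ops, "lookahead:", lookahead_digits(ops), "exact:", exact_digits(ops))
```

For two operands the guess is wrong only when the next column sums to exactly 9 and receives a carry (45 + 55 yields `"000"` instead of `"100"`). With six copies of 19, the carry of 5 out of the units column pushes the tens column past 9, which the lookahead cannot see, so the sketch returns `"014"` instead of `"114"`.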

accuracy, digit, lookahead, (15 more...)

arXiv.org Artificial Intelligence

2502.19981

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Europe > Germany > Saarland > Saarbrücken (0.04)
Asia > Middle East > Saudi Arabia > Asir Province > Abha (0.04)
Asia > British Indian Ocean Territory > Diego Garcia (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Accelerating Retrieval-Augmented Generation

Quinn, Derrick, Nouri, Mohammad, Patel, Neel, Salihu, John, Salemi, Alireza, Lee, Sukhan, Zamani, Hamed, Alian, Mohammad

arXiv.org Artificial Intelligence | Dec-14-2024

An evolving solution to address hallucination and enhance accuracy in large language models (LLMs) is Retrieval-Augmented Generation (RAG), which involves augmenting LLMs with information retrieved from an external knowledge source, such as the web. This paper profiles several RAG execution pipelines and demystifies the complex interplay between their retrieval and generation phases. We demonstrate that while exact retrieval schemes are expensive, they can reduce inference time compared to approximate retrieval variants, because an exact retrieval model can send a smaller but more accurate list of documents to the generative model while maintaining the same end-to-end accuracy. This observation motivates the acceleration of exact nearest neighbor search for RAG. In this work, we design Intelligent Knowledge Store (IKS), a Type 2 CXL device that implements a scale-out near-memory acceleration architecture with a novel cache-coherent interface between the host CPU and near-memory accelerators. IKS offers 13.4-27.9x faster exact nearest neighbor search over a 512GB vector database compared with executing the search on Intel Sapphire Rapids CPUs. This higher search performance translates to 1.7-26.3x lower end-to-end inference time for representative RAG applications. IKS is inherently a memory expander; its internal DRAM can be disaggregated and used by other applications running on the server, which prevents DRAM, the most expensive component in today's servers, from being stranded.
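In its simplest software form, exact nearest neighbor search is a brute-force scan. The Python sketch below is our own illustration, not the IKS interface: it scores every database vector against the query by inner product and keeps the top `k`, so, unlike an approximate index, it never misses a true neighbor. The names `dot` and `exact_top_k`, and the choice of inner-product similarity, are assumptions.

```python
import heapq
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def exact_top_k(query: Sequence[float],
                database: Sequence[Sequence[float]],
                k: int) -> List[Tuple[int, float]]:
    """Exact k-nearest-neighbor search: score every vector, keep the k best.

    Cost is O(N * d) per query because every stored vector is read once.
    """
    scored = ((dot(query, vec), idx) for idx, vec in enumerate(database))
    best = heapq.nlargest(k, scored)  # (score, index) pairs, highest score first
    return [(idx, score) for score, idx in best]


if __name__ == "__main__":
    docs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
    print(exact_top_k([1.0, 0.0], docs, k=2))
```

A small `k` is the trade-off the abstract describes: the full scan costs more per query, but a shorter and more accurate document list reaches the generative model.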

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2412.15246

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > New York > New York County > New York City (0.05)
(25 more...)

Genre: Research Report (0.82)

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

HiVeGen -- Hierarchical LLM-based Verilog Generation for Scalable Chip Design

Tang, Jinwei; Qin, Jiayin; Thorat, Kiran; Zhu-Tian, Chen; Cao, Yu; Yang; Zhao; Ding, Caiwen

arXiv.org Artificial Intelligence | Dec-6-2024

With Large Language Models (LLMs) recently demonstrating impressive proficiency in code generation, extending their abilities to Hardware Description Languages (HDLs) is a promising direction. However, LLMs tend to generate single HDL code blocks rather than hierarchical structures for hardware designs, which leads to hallucinations, particularly in complex designs such as Domain-Specific Accelerators (DSAs). To address this, we propose HiVeGen, a hierarchical LLM-based Verilog generation framework that decomposes generation tasks into LLM-manageable hierarchical submodules. HiVeGen further harnesses the advantages of such hierarchical structures by integrating automatic Design Space Exploration (DSE) into hierarchy-aware prompt generation, introducing weight-based retrieval to enhance code reuse, and enabling real-time human-computer interaction to lower error-correction cost, significantly improving the quality of generated designs.
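HiVeGen's internal data structures are not described here, so the following is only a hypothetical Python sketch of the decomposition idea: the design is a tree of submodules, and a post-order traversal produces one prompt per submodule, so every child is generated before the parent that instantiates it. `Module`, `generation_order`, and `build_prompts` are illustrative names, the prompt wording is our own, and no LLM is called.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Module:
    name: str
    spec: str  # natural-language description of this submodule
    children: List["Module"] = field(default_factory=list)


def generation_order(root: Module) -> List[Module]:
    """Post-order traversal: every submodule comes before its parent."""
    order: List[Module] = []

    def visit(m: Module) -> None:
        for child in m.children:
            visit(child)
        order.append(m)

    visit(root)
    return order


def build_prompts(root: Module) -> List[str]:
    """One small, self-contained prompt per submodule (hypothetical wording)."""
    prompts = []
    for m in generation_order(root):
        deps = ", ".join(c.name for c in m.children) or "none"
        prompts.append(f"Write Verilog module `{m.name}`: {m.spec}. "
                       f"Instantiate these already-generated submodules: {deps}.")
    return prompts


if __name__ == "__main__":
    pe = Module("pe", "multiply-accumulate processing element")
    array = Module("pe_array", "4x4 grid of processing elements", [pe])
    ctrl = Module("ctrl", "control finite-state machine")
    top = Module("accel", "top-level accelerator", [array, ctrl])
    for p in build_prompts(top):
        print(p)
```

Keeping each prompt scoped to one submodule is what keeps the pieces small; DSE, weight-based retrieval, and human-in-the-loop correction would attach per node but are not modeled in this sketch.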

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.05393

Country:

Europe (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Minnesota (0.04)
North America > United States > Connecticut > Tolland County > Storrs (0.04)

Genre: Research Report (0.50)

Industry: Semiconductors & Electronics (0.51)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Personality-Guided Code Generation Using Large Language Models

Guo, Yaoqi, Chen, Zhenpeng, Zhang, Jie M., Liu, Yang, Ma, Yun

arXiv.org Artificial Intelligence | Oct-16-2024

Code generation, the automatic creation of source code from natural language descriptions, has garnered significant attention due to its potential to streamline software development. Inspired by research that links task-personality alignment with improved development outcomes, we conduct an empirical study on personality-guided code generation using large language models (LLMs). Specifically, we investigate how emulating personality traits appropriate to the coding tasks affects LLM performance. We extensively evaluate this approach using seven widely adopted LLMs across four representative datasets. Our results show that personality guidance significantly enhances code generation accuracy, with improved pass rates in 23 out of 28 LLM-dataset combinations. Notably, in 11 cases, the improvement exceeds 5%, and in 5 instances, it surpasses 10%, with the highest gain reaching 12.9%. Additionally, personality guidance can be easily integrated with other prompting strategies to further boost performance.
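The abstract does not give the prompt format, so this is a hypothetical Python sketch of personality guidance as a prompt prefix that can be stacked with another prompting strategy (here, a step-by-step instruction). The persona text and the function name are our own assumptions. For orientation, seven LLMs across four datasets give the 28 LLM-dataset combinations mentioned in the abstract.

```python
from typing import Optional


def personality_guided_prompt(task: str,
                              persona: str,
                              extra_strategy: Optional[str] = None) -> str:
    """Prefix a coding task with a persona; optionally stack another prompting strategy."""
    parts = [f"You are a software developer with this personality: {persona}"]
    if extra_strategy:
        parts.append(extra_strategy)  # e.g. a chain-of-thought or few-shot instruction
    parts.append(f"Task: {task}")
    return "\n".join(parts)


if __name__ == "__main__":
    print(personality_guided_prompt(
        task="Write a function that returns the n-th Fibonacci number.",
        persona="methodical, detail-oriented, prefers well-tested solutions",
        extra_strategy="Think step by step before writing code.",
    ))
```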

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2411.00006

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Singapore (0.04)
Asia > China (0.04)
Africa (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.80)

Insights into LLM Long-Context Failures: When Transformers Know but Don't Tell

Lu, Taiming, Gao, Muhan, Yu, Kuai, Byerly, Adam, Khashabi, Daniel

arXiv.org Artificial Intelligence | Jun-20-2024

Large Language Models (LLMs) exhibit positional bias, struggling to utilize information from the middle or end of long contexts. Our study explores LLMs' long-context reasoning by probing their hidden representations. We find that while LLMs encode the position of target information, they often fail to leverage this in generating accurate responses. This reveals a disconnect between information retrieval and utilization, a "know but don't tell" phenomenon. We further analyze the relationship between extraction time and final accuracy, offering insights into the underlying mechanics of transformer models.
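Probing means fitting a small classifier on frozen hidden states to test whether some property can be read off them. As a hypothetical Python sketch (the paper's probe and features may differ), a nearest-centroid classifier predicts which region of the context holds the target (for example 0 = beginning, 1 = middle, 2 = end) from a hidden-state vector. High probe accuracy alongside low answer accuracy is the "know but don't tell" pattern. The vectors in the usage example are toy values, not model activations.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def fit_centroids(states: Sequence[Sequence[float]],
                  labels: Sequence[int]) -> Dict[int, List[float]]:
    """Average the hidden states of each label (the probe's only parameters)."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for h, y in zip(states, labels):
        if y not in sums:
            sums[y] = [0.0] * len(h)
        sums[y] = [s + x for s, x in zip(sums[y], h)]
        counts[y] += 1
    return {y: [s / counts[y] for s in vec] for y, vec in sums.items()}


def predict(centroids: Dict[int, List[float]], h: Sequence[float]) -> int:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(c: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(c, h))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


def probe_accuracy(centroids: Dict[int, List[float]],
                   states: Sequence[Sequence[float]],
                   labels: Sequence[int]) -> float:
    hits = sum(predict(centroids, h) == y for h, y in zip(states, labels))
    return hits / len(labels)


if __name__ == "__main__":
    train = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]  # toy "hidden states"
    pos = [0, 0, 1, 1]                                       # toy position labels
    probe = fit_centroids(train, pos)
    print(predict(probe, [4.9, 5.0]), probe_accuracy(probe, train, pos))
```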

accuracy, classifier, information, (16 more...)

arXiv.org Artificial Intelligence

2406.14673

Country:

Europe > Germany (0.04)
Asia > Singapore (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre: Research Report > New Finding (0.47)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Enhancing Accuracy in Generative Models via Knowledge Transfer

Tian, Xinyu, Shen, Xiaotong

arXiv.org Machine Learning | May-27-2024

This paper investigates the accuracy of generative models and the impact of knowledge transfer on their generation precision. Specifically, we examine a generative model for a target task, fine-tuned using a pre-trained model from a source task. Building on the "Shared Embedding" concept, which bridges the source and target tasks, we introduce a novel framework for transfer learning under distribution metrics such as the Kullback-Leibler divergence. This framework underscores the importance of leveraging inherent similarities between diverse tasks despite their distinct data distributions. Our theory suggests that shared structures can improve generation accuracy on a target task, provided that the source model can identify these shared structures and that knowledge transfers effectively from source to target learning. To demonstrate the practical utility of this framework, we explore its theoretical implications for two specific generative models: diffusion models and normalizing flows. The results show enhanced performance in both models over their non-transfer counterparts, indicating advancements for diffusion models and providing fresh insights into normalizing flows in transfer and non-transfer settings.
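The framework is stated under distribution metrics such as the Kullback-Leibler divergence. For reference, here is a small Python sketch of the discrete case, D_KL(P || Q) = sum over x of p(x) log(p(x) / q(x)). The paper works with the distributions of generative models such as diffusion models and normalizing flows; this discrete version only illustrates the quantity and its asymmetry, and the function name is our own.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(P || Q) for two discrete distributions over the same support, in nats."""
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue  # 0 * log(0 / q) is taken to be 0
        if qx == 0:
            return math.inf  # P puts mass where Q has none
        total += px * math.log(px / qx)
    return total


if __name__ == "__main__":
    print(kl_divergence([1.0, 0.0], [0.5, 0.5]))  # log 2
    print(kl_divergence([0.5, 0.5], [1.0, 0.0]))  # inf: the divergence is not symmetric
```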

approximation error, diffusion model, generation error, (16 more...)

arXiv.org Machine Learning

2405.16837

Country:

North America > United States > Minnesota (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

KRLS: Improving End-to-End Response Generation in Task Oriented Dialog with Reinforced Keywords Learning

Yu, Xiao, Wu, Qingyang, Qian, Kun, Yu, Zhou

arXiv.org Artificial Intelligence | Oct-19-2023

In task-oriented dialogs (TOD), reinforcement learning (RL) algorithms train a model to directly optimize its responses for task-related metrics. However, RL requires exploration, which can be time-consuming because of the slow autoregressive sequence generation process. We investigate a more efficient RL-based algorithm for improving TOD performance in an offline setting. First, we use a faster generation procedure that samples from independent next-word distributions after training the language model (LM) with supervised learning. We then introduce a fine-grained reward function that helps the model focus on learning key information in a dialog by measuring the importance and semantic closeness of each generated token. Experiments on the MultiWoZ dataset show that our new training algorithm, Keywords Reinforcement Learning with Next-word Sampling (KRLS), achieves state-of-the-art performance on the end-to-end response generation task, with a 15% training time reduction compared to a standard RL algorithm using autoregressive generation.
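Our reading of the two ingredients, as a hypothetical Python sketch rather than the KRLS implementation: (1) next-word sampling draws one token per position independently from per-position distributions (which, we assume, come from a single teacher-forced forward pass, so no position waits on a previous sample), and (2) a per-token reward weights each generated token by an importance score. Exact match with the reference token stands in for the paper's semantic-closeness measure, and the distributions and importance weights below are made up.

```python
import random
from typing import List, Mapping, Sequence


def next_word_sample(dists: Sequence[Mapping[str, float]],
                     rng: random.Random) -> List[str]:
    """Sample one token per position independently; samples are never fed back."""
    out = []
    for dist in dists:
        tokens = list(dist)
        weights = [dist[t] for t in tokens]
        out.append(rng.choices(tokens, weights=weights, k=1)[0])
    return out


def token_rewards(sampled: Sequence[str],
                  reference: Sequence[str],
                  importance: Mapping[str, float]) -> List[float]:
    """Toy fine-grained reward: the token's importance weight if it matches the reference, else 0."""
    return [importance.get(ref, 1.0) if tok == ref else 0.0
            for tok, ref in zip(sampled, reference)]


if __name__ == "__main__":
    rng = random.Random(0)
    dists = [{"the": 0.9, "a": 0.1}, {"hotel": 0.6, "restaurant": 0.4}, {"is": 1.0}]
    sample = next_word_sample(dists, rng)
    print(sample, token_rewards(sample, ["the", "hotel", "is"], {"hotel": 5.0}))
```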

algorithm, computational linguistic, krl, (15 more...)

arXiv.org Artificial Intelligence

2211.16773

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Dominican Republic (0.04)
Asia > Middle East > Jordan (0.04)
(9 more...)

Genre: Research Report (0.82)

Industry: Materials > Metals & Mining > Gold (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.69)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.66)

Exploring Equation as a Better Intermediate Meaning Representation for Numerical Reasoning

Wang, Dingzirui, Dou, Longxu, Zhang, Wenbin, Zeng, Junyu, Che, Wanxiang

arXiv.org Artificial Intelligence | Aug-21-2023

Numerical reasoning is vital for natural language processing models to understand and process numerical information in real-world scenarios. Most current methods first generate Intermediate Meaning Representations (IMRs) of questions and then generate answers. Current SOTA methods generate programs as IMRs with large language models (LLMs). Intuitively, equations have fewer restrictions and semantics closer to the question than programs, which should lead to higher generation accuracy. However, current LLMs generate equations worse than programs, which we hypothesize is because equation data is rarer than programs in pre-training data. In this paper, we therefore use equations as IMRs to solve the numerical reasoning task by addressing two problems: (1) theoretically, how to prove that equations are IMRs with higher generation accuracy than programs; (2) empirically, how to improve the generation accuracy of equations with LLMs. For the first problem, we propose and prove a proposition to theoretically compare the generation accuracy of different IMRs. For the second problem, we present a method called Boosting Numerical Reasoning by Decomposing the Generation of Equations (Bridge), which improves the accuracy of LLMs in generating equations as IMRs by reducing their tendency to generate constant expressions and programs. Our method improves performance by 2.2%, 0.9%, and 1.7% on the GSM8K, SVAMP, and Algebra datasets, respectively, compared to previous state-of-the-art methods under the single reasoning path setting. Our code and prompts are released at https://github.com/zirui-HIT/Bridge_for_Numerical_Reasoning.
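To see why an equation can sit closer to the question than a program, consider a made-up problem: "Tom buys 5 more apples and then doubles his pile, ending with 26. How many did he start with?" The equation IMR `2*(x + 5) = 26` follows the story's order, while a program IMR must invert it (`26 / 2 - 5`). The Python sketch below is our own illustration, not the Bridge method: it hands a one-variable linear equation to a tiny solver. It assumes linearity and uses `eval` on trusted strings only; `solve_linear` is a hypothetical name.

```python
from fractions import Fraction


def solve_linear(equation: str, var: str = "x") -> Fraction:
    """Solve a one-variable linear equation 'lhs = rhs'.

    Writes f(x) = lhs - rhs = a*x + b, reads b = f(0) and a = f(1) - f(0),
    and returns x = -b / a. Integer constants keep the arithmetic exact.
    """
    lhs, rhs = equation.split("=")
    no_builtins = {"__builtins__": {}}  # sketch only: never eval untrusted text

    def f(value: int) -> Fraction:
        env = {var: Fraction(value)}
        return Fraction(eval(lhs, no_builtins, env)) - Fraction(eval(rhs, no_builtins, env))

    b = f(0)
    a = f(1) - b
    if a == 0:
        raise ValueError("no unique solution (coefficient of the variable is zero)")
    return -b / a


if __name__ == "__main__":
    print(solve_linear("2*(x + 5) = 26"))  # equation IMR, solved for x
    print(26 / 2 - 5)                      # the equivalent program IMR, computed directly
```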

computational linguistic, equation, imr, (14 more...)

arXiv.org Artificial Intelligence

2308.10585

Country:

North America > Canada > Ontario > Toronto (0.05)
North America > Dominican Republic (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(10 more...)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)